Using Ontologies in Semantic Data Mining with SEGS and g-SEGS
Abstract
With the expansion of the Semantic Web and the availability of numerous ontologies that provide domain background knowledge and semantic descriptors for data, the amount of semantic data is growing rapidly. The data mining community faces a paradigm shift: instead of mining an abundance of empirical data supported by background knowledge, the new challenge is to mine the abundance of knowledge encoded in domain ontologies, constrained by heuristics computed from the empirical data collection. We address this challenge with an approach named semantic data mining, in which domain ontologies define the hypothesis search space and the data is used as a means of constraining and guiding hypothesis search and evaluation. The use of the prototype semantic data mining systems SEGS and g-SEGS is demonstrated in a simple semantic data mining scenario and in two real-life functional genomics scenarios of mining biological ontologies with the support of experimental microarray data.
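The idea that an ontology defines the hypothesis space while the data constrains the search can be illustrated with a minimal sketch. This is not the SEGS or g-SEGS algorithm: the toy ontology, the annotations, the target set, the function names, and the lift-style score are all hypothetical choices made for illustration. The sketch walks the ontology from general to specific terms, prunes any term that covers too few instances, and ranks the remaining terms by how strongly they are enriched in the target set.

```python
from collections import deque

# Hypothetical toy ontology: parent term -> more specific child terms.
ONTOLOGY = {
    "process": ["metabolism", "signaling"],
    "metabolism": ["lipid_metabolism"],
    "signaling": [],
    "lipid_metabolism": [],
}

# Hypothetical annotations: instance -> most specific terms it carries.
ANNOTATIONS = {
    "g1": {"lipid_metabolism"},
    "g2": {"lipid_metabolism"},
    "g3": {"metabolism"},
    "g4": {"signaling"},
    "g5": {"signaling"},
    "g6": {"signaling"},
}

# Hypothetical set of instances flagged as interesting by the empirical data.
TARGET = {"g1", "g2", "g4"}


def descendants(term, ontology):
    """Return the term together with all terms below it in the ontology."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(ontology.get(current, []))
    return seen


def covered(term, ontology, annotations):
    """Instances annotated with the term or with any of its descendants."""
    below = descendants(term, ontology)
    return {inst for inst, terms in annotations.items() if terms & below}


def search(root, ontology, annotations, target, min_support=2):
    """General-to-specific search over ontology terms.

    Returns (term, support, lift) tuples sorted by descending lift, where
    lift is the share of target instances among covered instances divided
    by the share of target instances overall.
    """
    base_rate = len(target) / len(annotations)
    results, queue, visited = [], deque([root]), set()
    while queue:
        term = queue.popleft()
        if term in visited:
            continue
        visited.add(term)
        cover = covered(term, ontology, annotations)
        if len(cover) < min_support:
            # Child terms cover subsets of this cover, so skip the branch.
            continue
        precision = len(cover & target) / len(cover)
        results.append((term, len(cover), precision / base_rate))
        queue.extend(ontology.get(term, []))
    return sorted(results, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    for term, support, lift in search("process", ONTOLOGY, ANNOTATIONS, TARGET):
        print(f"{term}: support={support}, lift={lift:.2f}")
```

The support threshold is what lets the data constrain the search: a term's descendants can only cover a subset of the instances the term itself covers, so a branch that falls below the threshold can be skipped without examining it further.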
Similar resources
TOWARDS SEMANTIC DATA MINING WITH g-SEGS
This paper introduces the term semantic data mining to denote a data mining approach where domain ontologies are used as background knowledge for data mining. It is motivated by successful applications of SEGS (search for enriched gene sets), a system that uses biological ontologies as background knowledge to construct descriptions of interesting gene sets in experimental microarray data. We ge...
Semantic Subgroup Discovery Systems and Workflows in the SDM-Toolkit
This paper addresses semantic data mining, a new data mining paradigm in which ontologies are exploited in the process of data mining and knowledge discovery. This paradigm is introduced together with new semantic subgroup discovery systems SDM-search for enriched gene sets (SEGS) and SDM-Aleph. These systems are made publicly available in the new SDM-Toolkit for semantic data mining. The toolk...
Comprehensive assessment of exposures to elongate mineral particles in the taconite mining industry.
Since the 1970s, concerns have been raised about elevated rates of mesothelioma in the vicinity of the taconite mines in the Mesabi Iron Range. However, insufficient quantitative exposure data have hampered investigations of the relationship between cumulative exposures to elongate mineral particles (EMP) in taconite dust and adverse health effects. Specifically, no research on exposure to taco...
Semantic Data Mining of Financial News Articles
Subgroup discovery aims at constructing symbolic rules that describe statistically interesting subsets of instances with a chosen property of interest. Semantic subgroup discovery extends standard subgroup discovery approaches by exploiting ontological concepts in rule construction. Compared to previously developed semantic data mining systems SDM-SEGS and SDM-Aleph, this paper presents a gener...
Two Novel DNAs That Enhance Symptoms and Overcome CMD2 Resistance to Cassava Mosaic Disease
Cassava mosaic begomoviruses (CMBs) cause cassava mosaic disease (CMD) across Africa and the Indian subcontinent. Like all members of the geminivirus family, CMBs have small, circular single-stranded DNA genomes. We report here the discovery of two novel DNA sequences, designated SEGS-1 and SEGS-2 (for sequences enhancing geminivirus symptoms), that enhance symptoms and break resistance ...
Publication date: 2011